汎用プロンプトの限界を越えて

ファインチューニングと専門的アーキテクチャによる最適化

「Few-Shot」プロンプティングは強力な出発点ですが、AIソリューションのスケーラビリティを高めるには、しばしば教師ありファインチューニングに移行する必要があります。このプロセスでは、特定の知識や行動パターンがモデルの重みに直接組み込まれます。

判断基準： 応答品質の向上とトークンコストの削減が、必要なコンピューティングおよびデータ準備の労力よりも大きい場合にのみ、ファインチューニングを行うべきです。

$コスト = トークン数 \times 単価$

小規模言語モデル（SLM）は、巨大なモデルの高効率かつ小型化されたバージョン（例：Phi-3.5、Mistral Small）であり、高度に選別された高品質なデータで訓練されています。

トレードオフ： SLMは著しく低いレイテンシを実現し、エッジデプロイメント（デバイス上でローカルに実行）を可能にしますが、巨大なLLMに見られる広範な一般化された「人間らしい」知能は犠牲になります。

エキスパートの混合（MoE）：推論時に計算効率を維持しながら、モデル全体のサイズを拡張する技術です。特定のトークンに対しては、「エキスパート」の一部のみがアクティブになります（例：Phi-3.5-MoE）。
マルチモダリティ：テキスト、画像、時折音声を同時に処理できるように設計されたアーキテクチャであり、テキスト生成を超えた用途を拡大します（例：Llama 3.2）。

効率の階層

常に最初に試すべきは プロンプトエンジニアリング です。失敗した場合は、 RAG（検索増強生成） を導入してください。ファインチューニングは最終段階の高度な最適化手段としてのみ使用すべきです。 ファインチューニング 最終段階の高度な最適化手段としてのみ使用すべきです。

TERMINALbash — 80x24

> Ready. Click "Run" to execute.

Question 1

When does the course recommend proceeding with fine-tuning over prompt engineering?

When the benefits in quality and cost (reduced token usage) outweigh compute effort.

Whenever you need the model to sound more human-like.

As the very first step before trying RAG or prompt engineering.

Only when deploying to an edge device.

Question 2

Which model architecture allows scaling model size while maintaining computational efficiency?

Supervised Fine-Tuning (SFT)

Retrieval-Augmented Generation (RAG)

Mixture of Experts (MoE)

Multimodality

Challenge: Edge Deployment Strategy

Apply your knowledge to a real-world scenario.

You need to deploy a multilingual translation tool that runs locally on a laptop with limited GPU resources.

Task 1

Select the appropriate model family and tokenizer for this multilingual, low-resource task.

Solution:
Mistral NeMo with the Tekken Tokenizer. It is optimized for multilingual text and fits within SLM constraints.

Task 2

Define the deployment framework for high-performance local inference.

Solution:
Use ONNX Runtime or Ollama for local execution to maximize hardware acceleration on the laptop.